Generating images from a rough sketch with Stable Diffusion

https://github.com/CompVis/stable-diffusion#image-modification-with-stable-diffusion

img2img(stable diffusion)

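This is the kind of thing I want to do

@8co28: I tried the #Img2Img feature of #stablediffusion (generating an image from a specified image).

I gave it a rough guide sketch drawn in 3 minutes (image 2) plus a prompt describing the elements of the picture, and it generated image 1.

Both images are exactly as they were at generation and instruction time, unedited.

Wow, this is amazing...

https://pbs.twimg.com/media/Fa6n5_SagAA0h10.png https://pbs.twimg.com/media/Fa6oOnIaQAcUSCy.jpg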

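@8co28: #Img2Img notes on failed guide sketches

A picture with many lines and colors that also includes hands (image 1) unsurprisingly does not work well (image 2)

A picture without hands, where the parts are easy to indicate (image 3), gives fairly good results (image 4)

Images 1 and 3 are my own older drawings

Using an already finished picture as the guide sketch does not seem practical yet (you need to make a simplified guide sketch)

https://pbs.twimg.com/media/Fa-4lhdaQAcKVN9.jpg https://pbs.twimg.com/media/Fa-_NDuaAAAkg24.png https://pbs.twimg.com/media/Fa_C5mUaIAUBMhF.jpg https://pbs.twimg.com/media/Fa_C-LtaUAEOa3-.png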

Environment setup

Using Stable Diffusion's img2img on a GTX1070 (VRAM 8GB)

https://gyazo.com/94feaf296290d210485e8847a5086cfa

code:zsh

python scripts/img2img.py --prompt "long haired girl is standing on the moon. Her arms are crossed and she is looking at us with a smile. Behind her is the earth in space. makoto shinkai style." --init-img img/mito3.jpg --strength 0.9 --n_samples 2 --n_iter 2

With img2img, passing n_iter 2 gives the output grid two rows
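The runs that follow repeat this command with strength lowered step by step from 0.9 to 0.1. A minimal Python sketch for generating those command lines, rather than editing the value by hand each time. The helper name `build_commands` and the shortened prompt are placeholders of mine; the flags are the ones used in the command above.

```python
import shlex

def build_commands(prompt, init_img, strengths, n_samples=2, n_iter=2):
    """Return one img2img command line per strength value (hypothetical helper)."""
    commands = []
    for s in strengths:
        args = [
            "python", "scripts/img2img.py",
            "--prompt", prompt,
            "--init-img", init_img,
            "--strength", str(s),
            "--n_samples", str(n_samples),
            "--n_iter", str(n_iter),
        ]
        # shlex.join quotes the prompt so it survives being pasted into a shell
        commands.append(shlex.join(args))
    return commands

# 0.9, 0.8, ..., 0.1: the values compared in these notes
strengths = [i / 10 for i in range(9, 0, -1)]
commands = build_commands("a girl standing on the moon", "img/mito3.jpg", strengths)
for cmd in commands:
    print(cmd)
```

The sketch only prints the commands; pasting them into the shell one at a time keeps each run easy to inspect.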

https://gyazo.com/930fdda782c464f3b64739bf01e9f7d6

strength 0.9

Bottom right: as a background, this is what I intended

https://gyazo.com/655a1ccdf65d33ed6dfb7e553644e1c5 https://gyazo.com/d0ce4286308dae3ffe028120810a2ee3

strength 0.8 / 0.7

0.8

Bottom right: the rim light is nice

0.7

Top left: the composition, including the eye level, is as intended

https://gyazo.com/90bfd310ef4b26fb82a7dff4599d96c8 https://gyazo.com/8006c4d3942c470fc4633ba2c31b98e2

0.6 / 0.5

0.6

The top row is good

https://gyazo.com/8b377ee394b8cb9ae866c9ee9fbcc8ec https://gyazo.com/94b2255df7a73114cbfdee3b909cd785

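0.4 / 0.3

The output is now quite close to the original composition. The figure has more detail, but the backgrounds the model is usually good at do not come through

https://gyazo.com/f656a24b49d68266ba8ad41200fbc3e6 https://gyazo.com/883026126ee658cd5cb87b1dfbe02b64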

0.2 / 0.1

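Generation is fast

The output barely differs from the original picture, so it is not worth doing

Observations

I would never use 0.2 or below.

Even around 0.4 the amount of detail increases, so it might be useful when I draw a rough and want it touched up a little

From around 0.5, results that clearly differ from the sketch start to appear

I like this one at 0.6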

https://gyazo.com/9a22cc49610910a6689c7a184ecde548

From 0.8 up, the AI starts drawing as it pleases, which brings surprises

This one at 0.8 is completely unlike the sketch

https://gyazo.com/d3130b4937ef3be323b44405bab14caa

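The composition is different, but this zoomed-in look is nice too. Getting composition variations like this could be useful for brainstorming when drawing a standalone illustration

https://gyazo.com/e05de5b9a43d1638cd3870a0fdf01c57 https://gyazo.com/680da84ed26cf4bcd2273d2ef72ac438

For brainstorming, a large value like 0.9, which lets the AI draw freely (the prompt carries more weight), works well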

https://gyazo.com/930fdda782c464f3b64739bf01e9f7d6

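The top right has drifted from the setting I described, but in its own way it looks like a 2000s light-novel cover

The bottom right also looks like the cover of a novel with a title like "Farewell, Earth (Terra)"

The frumpy everyday clothes and the hand held to the head suggest a story

The bottom left looks like something from an Osamu Tezuka manga (vaguely). I would never draw a picture like the bottom left myself, which makes it interesting

An opinion that around 0.6 is the easiest to use

@Pretty_Mundane: A small addition

I felt that a rougher initial image fed to the AI tends to give better output

If you draw in just the silhouette, the rough lighting, and the color scheme, then generate at Strength 0.6, it picks up the nuance nicely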


code:zsh

python scripts/img2img.py --prompt "VTuber Tsukino Mito is standing on the moon. Her arms are crossed and she is looking at us with a smile full of herself. Behind her is the bright blue earth. Eye level is her knee. This is an animation so it is in 3D. The background is the earth with the moon in the sky" --init-img img/mito2.jpg --strength 0.8

https://github.com/CompVis/stable-diffusion

strength is a value between 0.0 and 1.0, that controls the amount of noise that is added to the input image. Values that approach 1.0 allow for lots of variations but will also produce images that are not semantically consistent with the input. See the following example.

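In short: strength takes a value from 0.0 to 1.0 and sets how much noise is added to the input image. Closer to 1.0 you get more variation, at the cost of images that no longer match the input semantically.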
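One way to read the strength value in terms of work done. As far as I can tell from scripts/img2img.py, the script noises the init image up to `int(strength * ddim_steps)` steps and denoises from there, with `ddim_steps` defaulting to 50. Both of those details are my reading of the script, not something the README states, and the function name `denoising_steps` is a placeholder of mine. A minimal sketch under those assumptions:

```python
def denoising_steps(strength: float, ddim_steps: int = 50) -> int:
    """Steps actually run for a given strength.

    Assumption: mirrors int(strength * ddim_steps) as I read it in
    scripts/img2img.py; ddim_steps=50 is assumed to be the default.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    return int(strength * ddim_steps)

# The strength values tried in these notes, highest first
for i in range(9, 0, -1):
    s = i / 10
    print(f"strength {s:.1f}: {denoising_steps(s)} of 50 steps")
```

If that reading is right, it would also fit the observation that low strength values finish quickly: fewer steps are run, and less of the image is redrawn.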

https://gyazo.com/94feaf296290d210485e8847a5086cfa https://gyazo.com/7c4c9c0283c4d379171b09b02990716f

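Left: sketch. Right: output. Not bad, but not quite there! The eye level is off.

It ran fine with n_samples 2, but

setting n_samples = 2 does not produce a 2x2 grid, and specifying rows does not either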

code:zsh

https://gyazo.com/bec8139e35a19cdc85dafc0418ed2d3a

Reading the code, it loops over n_iter, so I specified that and got more rows
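A rough model of why n_samples alone never gave a 2x2 grid. My understanding, which I have not verified beyond reading the script, is that each of the n_iter loop passes produces n_samples images, and the grid then places a fixed number of images per row (n_samples unless rows is given). The function `grid_shape` and its parameter names are placeholders of mine. A sketch under those assumptions:

```python
import math

def grid_shape(n_samples, n_iter, per_row=0):
    """Return (rows, cols) of the output grid.

    Assumptions: total images = n_samples * n_iter, and the grid puts
    per_row images on each row, falling back to n_samples when per_row is 0.
    """
    total = n_samples * n_iter
    cols = per_row if per_row > 0 else n_samples
    rows = math.ceil(total / cols)
    return rows, min(cols, total)

print(grid_shape(2, 1))  # only n_samples set: a single row of two images
print(grid_shape(2, 2))  # n_samples 2 and n_iter 2: two rows of two images
```

Under this model, changing only the per-row count rearranges the same images; the total only grows when n_iter (or n_samples) grows.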

Apparently it does not work on Mac

@s_ryuuki: >resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

warnings.warn('resource_tracker: There appear to be %d '

It did not work on Mac.

👀 Generating images from a rough sketch with Stable Diffusion - 基素基 https://t.co/UGhLSe7Mzg

Officially, M1/M2 support is coming later, and NVIDIA GPUs are recommended

/villagepump/@kidooom#63045086774b170000e03874